Creating data visualisation beyond default.
In this take-home exercise, you are required to apply the interactivity and animation methods learned in last week's lesson to design an age-sex pyramid data visualisation that shows how the demographic structure of Singapore changed by age cohort and gender between 2000 and 2020 at planning area level.
For this task, two data sets will be used: Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2000-2010 and Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020. Both data sets are available from the Department of Statistics home page.
Before we get started, it is important to ensure that the required R packages have been installed. For the purpose of this exercise, the following packages will be used:
packages = c('ggiraph', 'plotly',
             'DT', 'gganimate',
             'tidyverse')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The code chunk below imports respopagesextod2000to2010.csv and respopagesextod2011to2020.csv into the R environment by using the read_csv() function of the readr package.
The bind_rows() function of the dplyr package is then used to combine the two data sets by rows, stacking them into a single data set from which the summarised data for the age-sex pyramid is derived.
respopagesex1 <- read_csv("data/respopagesextod2000to2010.csv")
respopagesex2 <- read_csv("data/respopagesextod2011to2020.csv")
respopagesex <- bind_rows(respopagesex1, respopagesex2)
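As a minimal sketch of what bind_rows() does here, the example below stacks two small hypothetical tables that stand in for the two CSV files. The values are made up for illustration. Only the column names AG, Sex, Pop and Time follow the ones used later in this post.

```r
library(dplyr)

# Hypothetical toy tables standing in for the two imported CSV files
pop_a <- data.frame(AG = c("0_to_4", "5_to_9"), Sex = "Males",
                    Pop = c(100, 120), Time = 2010)
pop_b <- data.frame(AG = c("0_to_4", "5_to_9"), Sex = "Males",
                    Pop = c(90, 110), Time = 2011)

# bind_rows() stacks the rows of the second table under the first,
# matching columns by name
combined <- bind_rows(pop_a, pop_b)

nrow(combined)  # 4 rows: 2 from each table
ncol(combined)  # 4 columns: unchanged
```

The number of columns stays the same and the number of rows is the sum of both inputs, which is why the years 2000-2010 and 2011-2020 end up in one long table.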
For the age-sex pyramid, the data is grouped by AG, Sex and Time, and each group is summarised by the total of its Pop values. AG is then converted to a factor with explicitly ordered levels so that the age groups sort from youngest to oldest, and the data is arranged by AG, as shown in the code chunk below.
AG_pop <- respopagesex %>%
group_by(`AG`, `Sex`, `Time`) %>%
summarise('Pop'= sum(`Pop`)) %>%
ungroup()
order <- c("0_to_4", "5_to_9", "10_to_14",
"15_to_19", "20_to_24", "25_to_29",
"30_to_34", "35_to_39", "40_to_44",
"45_to_49", "50_to_54", "55_to_59",
"60_to_64", "65_to_69", "70_to_74",
"75_to_79", "80_to_84", "85_to_89",
"90_and_over")
sorted_pop <- AG_pop %>%
mutate(AG = factor(AG, levels = order)) %>%
arrange(AG)
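The explicit level order matters because the age group labels are character strings, and a plain character sort does not follow age. The base R sketch below illustrates this with a small hypothetical subset of the labels. It is an illustration only and is not part of the exercise pipeline.

```r
# A small, deliberately shuffled subset of the age group labels
ag <- c("5_to_9", "10_to_14", "0_to_4", "90_and_over")

# Character sort (C-locale via method = "radix"): "10_to_14" lands before "5_to_9"
sort(ag, method = "radix")
# "0_to_4" "10_to_14" "5_to_9" "90_and_over"

# Factor with explicit levels: sorting follows the level order instead
age_levels <- c("0_to_4", "5_to_9", "10_to_14", "90_and_over")
ag_factor <- factor(ag, levels = age_levels)
as.character(sort(ag_factor))
# "0_to_4" "5_to_9" "10_to_14" "90_and_over"
```

ggplot2 also uses the factor levels to order a discrete axis, so the same conversion keeps the bars of the pyramid in age order.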
DataTables, available in R through the DT package, provides filtering, pagination, sorting and many other features for tables. Displaying the rows that correspond to a selection in the plot makes it easier to inspect the underlying values and look for insights.
highlight() is a function of the plotly package. It sets a variety of options for brushing (i.e., highlighting) multiple plots. These options are primarily designed for linking multiple plotly graphs, and may not behave as expected when linking plotly to another htmlwidget package via crosstalk. In some cases, other htmlwidgets will respect these options, such as persistent selection in leaflet.
bscols() is a helper function of the crosstalk package. It makes it easy to put HTML elements side by side. It can be called directly from the console but is especially designed to work in an R Markdown document.
The code chunk below is used to implement the coordinated brushing:
d <- highlight_key(sorted_pop)
p <- ggplot(d, aes(x = ifelse(Sex == "Males", yes = -Pop, no = Pop),
y = AG, fill = Sex)) +
geom_col() +
scale_x_continuous(breaks = seq(-3500000, 3500000, 1000000),
labels = paste0(as.character(c(seq(3500, 0, -1000), seq(500, 3500, 1000))),"K")) +
labs (x = "Population", y = "Age", title='Singapore Age-Sex Population Pyramid 2000-2020') +
theme_bw() +
theme(axis.ticks.y = element_blank()) +
scale_fill_manual(values = c("Males" = "darkblue", "Females" = "pink"))
gg <- highlight(ggplotly(p),
"plotly_selected")
crosstalk::bscols(gg,
DT::datatable(d),
widths = 15)
A population pyramid or “age-sex pyramid” is a graphical illustration of the distribution of a population (typically that of a country or region of the world) by age groups and sex; it typically takes the shape of a pyramid when the population is growing. Males are usually shown on the left and females on the right, and they may be measured in absolute numbers or as a percentage of the total population. The pyramid can be used to visualize the age of a particular population. It is also used in ecology to determine the overall age distribution of a population; an indication of the reproductive capabilities and likelihood of the continuation of a species.
The static population pyramid is plotted with the code chunk below.
ggplot(sorted_pop, aes(x = ifelse(Sex == "Males", yes = -Pop, no = Pop),
y = AG, fill = Sex)) +
geom_col(alpha = 0.5) +
scale_x_continuous(breaks = seq(-3500000, 3500000, 1000000),
labels = paste0(as.character(c(seq(3500, 0, -1000),
seq(500, 3500, 1000))),"K")) +
theme_bw() +
theme(axis.ticks.y = element_blank()) +
labs(title = 'Singapore Age-Sex Population Pyramid 2000-2020',
x = 'Population',
y = 'Age')
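The axis labels in the scale_x_continuous() calls are built by hand, so the label vector has to match the breaks in both length and order. As a hedged alternative, the sketch below derives the labels from the breaks themselves. The helper name label_thousands is hypothetical and is not part of ggplot2.

```r
# Hypothetical helper: show signed axis breaks as unsigned thousands,
# so that the male (negative) side is labelled with positive numbers
label_thousands <- function(x) paste0(abs(x) / 1000, "K")

breaks <- seq(-3500000, 3500000, 1000000)
label_thousands(breaks)
# "3500K" "2500K" "1500K" "500K" "500K" "1500K" "2500K" "3500K"
```

Because the labels argument of scale_x_continuous() also accepts a function, such a helper could be passed as labels = label_thousands, which would keep breaks and labels in step if the break range is changed later.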
To meet the task goal of showing how the demographic structure of Singapore changed by age cohort and gender between 2000 and 2020 at planning area level, gganimate will be used. It provides a range of new grammar classes that can be added to a plot object to customise how the plot should change over time.
The animated population pyramid is plotted with the code chunk below.
ggplot(sorted_pop, aes(x = ifelse(Sex == "Males", yes = -Pop, no = Pop),
y = AG, fill = Sex)) +
geom_col(alpha = 0.5) +
scale_x_continuous(breaks = seq(-200000, 200000, 50000),
labels = paste0(as.character(c(seq(200, 0, -50), seq(50, 200, 50))),"K")) +
theme_bw() +
theme(axis.ticks.y = element_blank()) +
labs(title = 'Singapore Age-Sex Population Pyramid {frame_time}',
x = 'Population',
y = 'Age')+
transition_time(as.integer(Time))+
ease_aes()
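Conceptually, transition_time() combined with the default linear easing of ease_aes() fills the gaps between yearly observations with interpolated in-between frames. The base R sketch below imitates that idea for a single age-sex group with made-up numbers. It is only an illustration of linear interpolation and does not reproduce how gganimate computes its frames internally.

```r
# Hypothetical population counts for one age-sex group in two consecutive years
years <- c(2000, 2001)
pop   <- c(100000, 104000)

# Five evenly spaced time points between the two years, analogous to frames
frame_times <- seq(2000, 2001, length.out = 5)

# Linear interpolation of the bar length at each in-between time point
frame_pop <- approx(x = years, y = pop, xout = frame_times)$y
frame_pop
# 100000 101000 102000 103000 104000
```

Each in-between value corresponds to a bar length drawn in one frame, which is what makes the bars appear to grow or shrink smoothly from one year to the next.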